Syllable-based Language Models in Speech Recognition for English Spoken Document Retrieval
Abstract
The spoken content of audio/visual collections such as TV or radio archives is an information resource of enormous potential. The challenge is to develop methods that will make it possible to browse or search these collections. The experimental results presented in this paper demonstrate that syllable-level transcripts provide an important supplement to conventional word-level transcripts for the task of unlimited vocabulary American English spoken document retrieval. Recognition is performed with syllable language models with vocabulary sizes 20k, 10k, 5k, 1k, and 500. The syllable recognition rates of the 10k and 5k models are comparable to that achieved by a baseline 100k word-based language model. A simple retrieval experiment involving a fuzzy full text search supplies proof-of-concept that syllable-based transcripts make it possible to retrieve spoken documents that contain query words not included in the 100k vocabulary of the word-based language model.
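The abstract does not say how the fuzzy full text search was implemented. As a minimal sketch of the general idea, assuming a Sellers-style approximate substring match over syllable tokens (not necessarily the authors' method), the following finds a query's syllable sequence in a syllable transcript while tolerating a few recognition errors. The function name `fuzzy_find`, the `max_errors` parameter, and the syllable spellings are all hypothetical and chosen for illustration only.

```python
from typing import List, Optional, Tuple


def fuzzy_find(query: List[str], transcript: List[str],
               max_errors: int = 1) -> Optional[Tuple[int, int]]:
    """Approximate substring search over syllable tokens.

    Returns (end_index, distance) for the best match of `query` inside
    `transcript` whose edit distance is at most `max_errors`, or None.
    This is an illustrative sketch, not the method used in the paper.
    """
    m = len(query)
    # col[i] = cheapest cost of aligning the first i query syllables
    # with some transcript span ending at the current position.
    col = list(range(m + 1))
    best = None
    for j, syllable in enumerate(transcript):
        new = [0]  # a match may start at any transcript position
        for i in range(1, m + 1):
            substitution = col[i - 1] + (0 if query[i - 1] == syllable else 1)
            skip_transcript = col[i] + 1   # extra syllable in the transcript
            skip_query = new[i - 1] + 1    # query syllable missing from it
            new.append(min(substitution, skip_transcript, skip_query))
        col = new
        if col[m] <= max_errors and (best is None or col[m] < best[1]):
            best = (j, col[m])
    return best


# Hypothetical syllable transcript with one misrecognized syllable ("su").
transcript = ["in", "ko", "su", "vo", "now"]
query = ["ko", "so", "vo"]  # a query word split into syllables
match = fuzzy_find(query, transcript, max_errors=1)
```

Because the match operates on syllables rather than words, a query word that is missing from the word-level recognizer's vocabulary can still be located, provided its syllables were recognized closely enough.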
Similar resources
Generating Phonetic Cognates to Handle Named Entities in English-Chinese Cross-Language Spoken Document Retrieval
We have developed a technique for automatic transliteration of named entities for English-Chinese cross-language spoken document retrieval (CL-SDR). Our retrieval system integrates machine translation, speech recognition and information retrieval technologies. An English news story forms a textual query that is automatically translated into Chinese words, which are mapped into Mandarin syllable...
Using syllable-based indexing features and language models to improve German spoken document retrieval
Spoken document collections with high word-type/word-token ratios and heterogeneous audio continue to constitute a challenge for information retrieval. The experimental results reported in this paper demonstrate that syllable-based indexing features can outperform word-based indexing features on such a domain, and that syllable-based speech recognition language models can successfully be used t...
Multimedia fusion in automatic extraction of studio speech segments for spoken document retrieval
This paper describes our progress in Cantonese spoken document retrieval. Over 60 hours of Cantonese television news broadcasts have been collected as part of AoE-IT Multimedia Repository. We have also developed the Multimedia Markup Language (MmML) for annotating the multimedia content in terms of anchor/field video frames and audio recordings. The audio tracks are indexed by a Cantonese sylla...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieving from this archive, is one of the key issues for this broadcasting...
Prosody-enriched lattices for improved syllable recognition
Automatic recognition of syllables is useful for many spoken language applications such as speech recognition and spoken document retrieval. Short-term spectral properties (such as mel-frequency cepstral coefficients, or MFCCs) are usually the features of choice for such systems, which typically ignore suprasegmental (prosodic) cues that manifest themselves at the syllable, word and utterance le...